On Employing a Highly Mismatched Crowd for Speech Transcription

Authors

Purushotam G. Radadia

Rahul Kumar

Kanika Kalra

Shirish Subhash Karande

Sachin Lodha

Abstract

Crowdsourcing provides a cheap and fast way to obtain speech transcriptions. The crowd size available for a task is inversely proportional to the skill requirements. Hence, there has been recent interest in studying the utility of mismatched crowd workers, who provide transcriptions even without knowing the source language. Nevertheless, these studies have required that the worker be capable of providing a transcription in Roman script. We believe that if the script constraint is removed, then countries like India can provide a significantly larger crowd base. With this as motivation, in this paper we consider transcription of spoken Russian words by a rural Indian crowd that is unfamiliar with Russian and has very limited knowledge of English. The crowd we employed knew Gujarati, Marathi, and Telugu, and used the scripts of these languages to provide their transcriptions. We used an insertion-deletion-substitution channel to model the transcription errors. With a parallel channel model, we can easily combine the crowd inputs. We show that the 4 transcriptions in Indic scripts (2 Gujarati, 1 Marathi, 1 Telugu) provide an accuracy of 73.77% (vs. 47% for the ROVER algorithm) and a 4-best accuracy of 86.48%, even without employing any worker filtering.
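The abstract's idea of combining crowd inputs through parallel insertion-deletion-substitution (IDS) channels can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy setting where the candidate words and the worker transcriptions share one alphabet, every worker uses the same channel, and the error probabilities `p_ins`, `p_del`, and `p_sub` are made-up values. The function names `ids_likelihood` and `decode` are hypothetical. Each candidate word is scored by summing, over workers, the log-likelihood of that worker's transcription, and the highest-scoring candidate is returned.

```python
import math


def ids_likelihood(source, observed, alphabet_size,
                   p_ins=0.05, p_del=0.05, p_sub=0.15):
    """Probability that an IDS channel turns `source` into `observed`.

    Assumed toy channel (illustrative values, not taken from the paper):
    before each source symbol, and once more at the end, the channel may
    insert uniformly random symbols (probability p_ins per insertion);
    each source symbol is then deleted (p_del), substituted by a uniformly
    chosen different symbol (p_sub), or copied unchanged (the remainder).
    """
    n, m = len(source), len(observed)
    p_ok = 1.0 - p_del - p_sub
    # f[i][j]: probability of consuming i source symbols and emitting
    # the first j observed symbols, summed over all alignments.
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = f[i][j]
            if cur == 0.0:
                continue
            if j < m:  # insertion of observed[j]
                f[i][j + 1] += cur * p_ins / alphabet_size
            if i < n:
                stay = cur * (1.0 - p_ins)  # stop inserting, read source[i]
                f[i + 1][j] += stay * p_del  # deletion
                if j < m:
                    if source[i] == observed[j]:
                        emit = p_ok  # correct copy
                    else:
                        emit = p_sub / (alphabet_size - 1)  # substitution
                    f[i + 1][j + 1] += stay * emit
    return f[n][m] * (1.0 - p_ins)  # no further insertions at the end


def decode(candidates, transcriptions, alphabet_size):
    """Pick the candidate that maximizes the summed log-likelihood.

    Treats the workers as independent parallel channels, so their
    log-likelihoods simply add up.
    """
    def score(word):
        return sum(math.log(ids_likelihood(word, t, alphabet_size))
                   for t in transcriptions)
    return max(candidates, key=score)


if __name__ == "__main__":
    vocabulary = ["moloko", "molot", "zoloto"]      # hypothetical candidates
    crowd = ["maloko", "moloka", "mloko", "molokko"]  # hypothetical inputs
    print(decode(vocabulary, crowd, alphabet_size=26))
```

In a setting like the one the abstract describes, each script would presumably need its own channel parameters relating source sounds to that script's symbols. The sketch sidesteps this by using one shared alphabet.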


Similar resources

Feasibility of Post-Editing Speech Transcriptions with a Mismatched Crowd

Manual correction of speech transcription can involve a selection from plausible transcriptions. Recent work has shown the feasibility of employing a mismatched crowd for speech transcription. However, it is yet to be established whether a mismatched worker has sufficiently fine-granular speech perception to choose among the phonetically proximate options that are likely to be generated from th...


Transcribing continuous speech using mismatched crowdsourcing

Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech...


Automatic Spontaneous Speech Grading: A Novel Feature Derivation Technique using the Crowd

In this paper, we address the problem of evaluating spontaneous speech using a combination of machine learning and crowdsourcing. Machine learning techniques inadequately solve the stated problem because automatic speaker-independent speech transcription is inaccurate. The features derived from it are also inaccurate and so is the machine learning model developed for speech evaluation. To addres...


Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka

In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages in consideration are Swahili, Amharic, and Dinka. There is a language mismatch in this scenario. More specifically, utterances spok...


Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition

It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speak...



Publication date: 2016
